Conversation
Force-pushed from f96df50 to 82c5546
Force-pushed from 82c5546 to 3000120
Force-pushed from 070fba2 to 862274b
Force-pushed from 3000120 to 9ee38b4
Pull Request Test Coverage Report for Build 21671598072
💛 - Coveralls
Force-pushed from 64bfd6d to ff3da31
packages/llama-index-workflows/src/workflows/context/internal_context.py (resolved review thread)
```diff
     @property
     def status(self) -> Status:
-        """Get the current status by inspecting the handler state."""
-        if not self.run_handler.done():
-            return "running"
-        # done - check if cancelled first
-        if self.run_handler.cancelled():
-            return "cancelled"
-        # then check for exception
-        exc = self.run_handler.exception()
-        if exc is not None:
-            return "failed"
-        return "completed"
+        """Get the current status by inspecting the terminal event or handler state.
+
+        Status is derived from the terminal event type when available:
+        - WorkflowCancelledEvent -> "cancelled"
+        - WorkflowTimedOutEvent -> "failed" (timeout is a failure mode)
+        - WorkflowFailedEvent -> "failed"
+        - Plain StopEvent -> "completed"
+
+        Falls back to checking handler state if no terminal event yet.
+        """
+        # First check if we have a terminal event - derive status from event type
+        if self._terminal_event is not None:
+            if isinstance(self._terminal_event, WorkflowCancelledEvent):
+                return "cancelled"
+            elif isinstance(self._terminal_event, WorkflowTimedOutEvent):
+                return "failed"
+            elif isinstance(self._terminal_event, WorkflowFailedEvent):
+                return "failed"
+            else:
+                return "completed"
+
+        # Fall back to handler state check if no terminal event yet
+        if not self.run_handler.is_done():
+            return "running"
+        # If handler is done but we don't have a terminal event, it was likely
+        # cancelled externally or failed before emitting a terminal event
+        return "running"
```
🟡 WorkflowServer status can report "running" even after handler completion if no terminal event was observed
In _WorkflowHandler.status, if no terminal event was recorded and run_handler.is_done() is true, the code returns "running" unconditionally.
Actual behavior: completed/failed/cancelled runs can be reported as "running" in persistence/API if the terminal StopEvent was not observed/recorded by _stream_events for any reason.
Expected behavior: if the handler is done and there is no terminal event, the status should be derived from handler completion state (cancelled vs exception vs completed), not forced to "running".
Code: workflows/server/server.py:1685-1713
Recommendation: When run_handler.is_done() is true and _terminal_event is None, fall back to run_handler.cancelled() / run_handler.exception() to classify as cancelled/failed/completed (or at least "failed"/"cancelled"), rather than returning "running".
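A minimal sketch of that fallback, reusing only the names already visible in the diff above (`_terminal_event`, `run_handler`, the event classes); illustrative, not the exact server code:

```python
@property
def status(self) -> Status:
    # Prefer the recorded terminal event when one was observed.
    if self._terminal_event is not None:
        if isinstance(self._terminal_event, WorkflowCancelledEvent):
            return "cancelled"
        if isinstance(self._terminal_event, (WorkflowTimedOutEvent, WorkflowFailedEvent)):
            return "failed"
        return "completed"
    if not self.run_handler.is_done():
        return "running"
    # Handler finished without a terminal event being observed: classify from
    # handler state instead of reporting "running".
    if self.run_handler.cancelled():
        return "cancelled"
    if self.run_handler.exception() is not None:
        return "failed"
    return "completed"
```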
Force-pushed from b231c6d to 2574a33
```python
def __init__(self) -> None:
    self._queues: dict[str, AsyncioAdapterQueues] = {}
    self._max_concurrent_runs: weakref.WeakValueDictionary[
        str, asyncio.Semaphore
    ] = weakref.WeakValueDictionary()
```
🔴 BasicRuntime concurrency limiting can silently stop working due to WeakValueDictionary semaphore storage
BasicRuntime stores per-workflow semaphores in a weakref.WeakValueDictionary:
`self._max_concurrent_runs: weakref.WeakValueDictionary[str, asyncio.Semaphore]` (basic.py:184-188).
Because the only long-lived reference to each semaphore is the weak dictionary entry, semaphores may be garbage-collected at any time when not currently being awaited. When that happens, the next call to _maybe_acquire_max_concurrent_runs() will create a new semaphore, effectively resetting concurrency limits and allowing more than the configured num_concurrent_runs.
Actual: concurrency limit can be bypassed intermittently/non-deterministically.
Expected: concurrency limit should be enforced consistently for the process lifetime (or at least until runtime.destroy()).
Impact: can exceed intended concurrency caps, causing resource exhaustion and incorrect load-shedding behavior.
Recommendation: Use a normal dict (strong refs) for _max_concurrent_runs, and clear it in destroy(); or otherwise keep strong references for semaphore lifetime management.
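A sketch of the strong-reference variant; the signature of `_maybe_acquire_max_concurrent_runs` and the `destroy()` body below are assumed for illustration, only the storage change is the point:

```python
def __init__(self) -> None:
    self._queues: dict[str, AsyncioAdapterQueues] = {}
    # Plain dict: semaphores are strongly referenced for the runtime's lifetime,
    # so the configured limit cannot be reset by garbage collection.
    self._max_concurrent_runs: dict[str, asyncio.Semaphore] = {}

async def _maybe_acquire_max_concurrent_runs(  # assumed signature
    self, workflow_name: str, num_concurrent_runs: int | None
) -> asyncio.Semaphore | None:
    if num_concurrent_runs is None:
        return None
    sem = self._max_concurrent_runs.get(workflow_name)
    if sem is None:
        sem = asyncio.Semaphore(num_concurrent_runs)
        self._max_concurrent_runs[workflow_name] = sem
    await sem.acquire()
    return sem

async def destroy(self) -> None:
    # Release the strong references only when the runtime is torn down.
    self._max_concurrent_runs.clear()
```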
```python
async def cancel_handlers_and_tasks(self, *, graceful: bool = True) -> None:
    """Cancel the handler and release it from the store.

    Args:
        graceful: If True, request graceful cancellation and wait for
            WorkflowCancelledEvent. If False, force immediate cancellation
            (used for idle release where we don't want to emit cancel event).
    """
    if not self.run_handler.is_done():
        if graceful:
            try:
                # Request graceful cancellation - this will emit WorkflowCancelledEvent
                await self.run_handler.cancel_run()
            except Exception:
                pass
            try:
                # Wait for the workflow to complete after cancellation
                # This gives time for WorkflowCancelledEvent to be emitted
                await asyncio.wait_for(self.run_handler, timeout=2.0)
            except asyncio.TimeoutError:
                # Force cancel if graceful cancellation didn't complete in time
                self.run_handler.cancel()
            except asyncio.CancelledError:
                pass
            except Exception:
                pass
        else:
            # Force immediate cancellation without waiting
            try:
                await self.run_handler.cancel_run()
            except Exception:
                pass
            try:
                self.run_handler.cancel()
            except Exception:
                pass
```
🟡 Idle release uses graceful cancellation even when graceful=False, contradicting intended semantics
_WorkflowHandler.cancel_handlers_and_tasks(graceful=False) is documented/used for idle release “where we don't want to emit cancel event”, but it still calls await self.run_handler.cancel_run() before hard-cancelling.
Actual: idle release sends a TickCancelRun into the workflow (cancel_run()), which can cause the workflow to emit WorkflowCancelledEvent and transition persisted status to cancelled.
Expected: idle release should stop in-memory execution without changing logical workflow outcome (it should remain resumable/running), or at least avoid emitting cancellation signals.
Code (workflows/server/server.py:1937-1945), in the non-graceful branch:
`await self.run_handler.cancel_run()` ... `self.run_handler.cancel()`
Impact: idle release may incorrectly cancel runs instead of just unloading them, breaking resumability and causing incorrect persisted status.
Recommendation: For graceful=False, avoid calling cancel_run(); instead only stop the streaming task and close adapter/resources (or implement a runtime-specific ‘detach/unload’ that doesn’t enqueue TickCancelRun).
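One possible shape for the non-graceful branch under that recommendation; `_streaming_task` and `_adapter` are hypothetical names for whatever the handler wrapper actually holds, since those attributes aren't shown here:

```python
else:
    # Idle release: unload from memory without changing the run's logical
    # outcome. Avoid cancel_run(), which enqueues a TickCancelRun and can
    # emit WorkflowCancelledEvent / flip persisted status to "cancelled".
    if self._streaming_task is not None:  # hypothetical: the _stream_events task
        self._streaming_task.cancel()
    try:
        # hypothetical: release adapter/runtime resources; a runtime-specific
        # "detach"/"unload" hook would go here instead of a cancellation.
        await self._adapter.close()
    except Exception:
        pass
```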
Force-pushed from 2b2fe68 to e31a92d
Force-pushed from 3f995dc to 380caf0
Force-pushed from e6ed713 to 3149d66
Force-pushed from 7019925 to 9a025f9
Force-pushed from bc769c2 to 1a9de11
Force-pushed from 8c4d60a to 30278d7
Force-pushed from 69da0b3 to 9a296b1
Force-pushed from 53dab5b to ee2c407
Force-pushed from 6631af8 to 67c0fdc
Force-pushed from 250d793 to 35194c5
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB.
```python
async with self._lock:
    state, commit_fn = await self._run_sync(_edit_with_lock)
    try:
        yield state
        await self._run_sync(commit_fn, state)
    except Exception:
        raise
```
🔴 Database connection leak when exception raised in edit_state context
When an exception is raised inside the edit_state context manager, the database connection is never closed, causing a resource leak.
How the bug is triggered
- `_edit_with_lock()` opens a connection and begins a transaction (lines 492-493)
- It returns `state, commit_fn`, where `commit_fn` contains the cleanup logic (`trans.commit()` and `conn.close()`)
- In the async context manager (lines 519-525):
  - `state, commit_fn = await self._run_sync(_edit_with_lock)`: the connection is opened
  - `yield state`: user code runs
  - If user code raises an exception, execution jumps to lines 524-525, which just re-raise
  - `commit_fn` is never called, so the connection is never closed
Actual vs Expected
Actual: When user code raises an exception:

```python
async with store.edit_state() as state:
    state["value"] = "modified"
    raise ValueError("intentional error")  # Connection leaks!
```

The connection remains open and the transaction is left hanging.
Expected: The connection should be rolled back and closed when an exception occurs.
Impact
Database connection pool exhaustion over time, especially in long-running applications with error conditions. This could cause the application to eventually fail to connect to the database.
Recommendation: Add rollback and connection cleanup in the exception handler:

```python
async with self._lock:
    state, commit_fn = await self._run_sync(_edit_with_lock)
    try:
        yield state
        await self._run_sync(commit_fn, state)
    except Exception:
        # Need to rollback and close the connection
        def _rollback_and_close() -> None:
            try:
                trans.rollback()
            finally:
                conn.close()

        await self._run_sync(_rollback_and_close)
        raise
```

Alternatively, restructure to use a separate cleanup function that's always called, or capture `conn` and `trans` in a way that allows cleanup in the except block.
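A sketch of the "cleanup function that's always called" variant, assuming `_edit_with_lock` is extended to also return a `rollback_fn` wrapping `trans.rollback()` and `conn.close()` (that extension is hypothetical):

```python
async with self._lock:
    state, commit_fn, rollback_fn = await self._run_sync(_edit_with_lock)
    committed = False
    try:
        yield state
        await self._run_sync(commit_fn, state)
        committed = True
    finally:
        if not committed:
            # Roll back and close the connection on any exception,
            # including generator cleanup/cancellation.
            await self._run_sync(rollback_fn)
```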
Force-pushed from 35194c5 to 816069d
Force-pushed from 816069d to 2439fff
Force-pushed from 5624529 to 64861fc